skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Bittremieux, Wout"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Marine diazotrophic cyanobacteria play a crucial role in oceanic nitrogen cycling, supporting primary production and ecosystem balance. Crocosphaera watsonii WH8501 exemplifies this ability by temporally separating photosynthesis and diazotrophy to sustain metabolism. To investigate the regulatory mechanisms underlying this process, we employed LC/MS-MS proteomics in a diel culturing experiment, revealing tightly coordinated protein abundance patterns. Our findings showed a sophisticated temporal regulation of metabolic processes categorized within six distinct protein abundance clusters: (1) nitrogen fixation and amino acid biosynthesis proteins peaked during the night, while (2) glycogen metabolism and dark reactions of photosynthesis were most abundant during the night and day-night transition, likely supporting carbon consumption and energy production. Midday (3 and 4) was dominated by proteins related to photosynthesis, cellular division, and lipid synthesis, whereas late-day peaks (5) in peptide biosynthesis may facilitate nitrogenase complex formation. Notably, the day-night transition (6) exhibited fine-tuned coordination of nitrogenase assembly, with FeS cluster proteins preceding peak nitrogenase iron protein abundance, implying a temporally ordered sequence for functional enzyme formation. Within these categories, sharp temporal patterns emerged in iron trafficking to heme and iron cluster biosynthetic systems, consistent with the need to maintain tight control of iron distribution to metalloproteins at each temporal transition. These results highlight the intricate diel regulation that enables Crocosphaera to balance nitrogen fixation and photosynthesis within a single cell. The observed coordination supports the existence of a complex regulatory system ensuring optimal metabolic performance, reinforcing the critical role of temporal control in sustaining these globally significant biological processes. 
    more » « less
    Free, publicly-accessible full text available July 11, 2026
  2. Mass spectrometry is the dominant technology in the field of proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Due to the complexity of the instrumentation and resulting data, sophisticated computational methods are required for the processing and interpretation of acquired mass spectra. Machine learning has shown great promise to improve the analysis of mass spectrometry data, with numerous purpose-built methods for improving specific steps in the data acquisition and analysis pipeline reaching widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model for mass spectra. To this end, we pre-train a spectrum encoder using de novo sequencing as a pre-training task. We then show that using these pre-trained spectrum representations improves our performance on the four downstream tasks of spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. Finally, we perform multi-task fine-tuning and find that this approach improves the performance on each task individually. Overall, our work demonstrates that a foundation model for tandem mass spectrometry proteomics trained on de novo sequencing learns generalizable representations of spectra, improves performance on downstream tasks where training data is limited, and can ultimately enhance data acquisition and analysis in proteomics experiments. 
    more » « less
    Free, publicly-accessible full text available May 19, 2026
  3. Abstract A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information—de novo peptide sequencing—is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo’s superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome. 
    more » « less
  4. Free, publicly-accessible full text available December 15, 2025
  5. Background: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments. Aim of Review: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries. Key Scientific Concepts of Review: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future. 
    more » « less
  6. Abstract MotivationDriven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. ResultsWe propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes two stages of existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA’s tensor core units, which is emerging and readily available in the recent graphics processing unit (GPU). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. Availability and implementationHOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc. 
    more » « less
  7. Free, publicly-accessible full text available May 12, 2026